Perturbing Across the Feature Hierarchy to Improve Standard and Strict Blackbox Attack Transferability

Neural Information Processing Systems

We consider the blackbox transfer-based targeted adversarial attack threat model in the realm of deep neural network (DNN) image classifiers. Rather than focusing on crossing decision boundaries at the output layer of the source model, our method perturbs representations throughout the extracted feature hierarchy to resemble other classes. We design a flexible attack framework that allows for multi-layer perturbations and demonstrates state-of-the-art targeted transfer performance between ImageNet DNNs. We also show the superiority of our feature space methods under a relaxation of the common assumption that the source and target models are trained on the same dataset and label space, in some instances achieving a $10\times$ increase in targeted success rate relative to other blackbox transfer methods. Finally, we analyze why the proposed methods outperform existing attack strategies and show an extension of the method in the case when limited queries to the blackbox model are allowed.


Review for NeurIPS paper: Perturbing Across the Feature Hierarchy to Improve Standard and Strict Blackbox Attack Transferability

Neural Information Processing Systems

Weaknesses:
- The first major concern is the limited methodological contribution compared to FDA. The proposed method simply aggregates (i.e., sums) the FDA objectives of multiple layers and adds a cross-entropy term, as other attack methods do; in other words, the approach is straightforward. Although the improvements of the proposed method are meaningful, the results are neither surprising nor interesting. TMIM/SGM methods do not use the training data of the white-box model, while FDA-based frameworks use that data to train the auxiliary functions g. In my opinion, access to only pre-trained white-box models differs substantially from access to the whole training data, and thus the latter setting uses more knowledge than the former.
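To make the aggregation described above concrete, here is a minimal sketch of summing per-layer target-class losses and adding an output-layer cross-entropy term. It is not the authors' implementation. The names `multi_layer_objective`, `layer_logits`, and `ce_weight` are hypothetical, and using a plain cross-entropy on each auxiliary function's logits as the per-layer term is an assumption made for illustration; the actual FDA objective may include further terms.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability of the target class."""
    return -math.log(softmax(logits)[target])


def multi_layer_objective(layer_logits, output_logits, target, ce_weight=1.0):
    """Sum per-layer target-class losses and add a weighted output-layer CE term.

    layer_logits: one logit vector per attacked layer, standing in for the
        output of a hypothetical auxiliary function g at that layer (assumption).
    output_logits: logits of the white-box model's final layer.
    target: index of the target class.
    ce_weight: hypothetical weight on the output-layer term (assumption).
    """
    layer_terms = [cross_entropy(logits, target) for logits in layer_logits]
    return sum(layer_terms) + ce_weight * cross_entropy(output_logits, target)


# Two attacked layers plus the output layer, four classes, target class 1.
uniform = [0.0, 0.0, 0.0, 0.0]
baseline = multi_layer_objective([uniform, uniform], uniform, target=1)

# Moving every representation toward the target class lowers the objective.
shifted = [0.0, 2.0, 0.0, 0.0]
improved = multi_layer_objective([shifted, shifted], shifted, target=1)
```

An attack would lower this quantity by updating the input perturbation; the sketch only shows how the terms combine, which is the step the concern above calls straightforward.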

